Basic Information

Column

Introduction

In this dashboard, we summarize the information from the data provided by Ohio Department of Health.

In this data set, there are 8 variables.

  • County: 88 counties
  • Sex: Female, Male, Unknown
  • Age Range: 0-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+, Unknown
  • Onset Data
  • Date of Death
  • Case Count
  • Death Count
  • Hospitalized Count


Hospitalized Cases in Healthcare Zones

  • Zone 1 (Cleveland Area): 758 Cases (31.7%)
  • Zone 2 (Columbus Area): 176 Cases (18.41%)
  • Zone 3 (Cincinnati/Dayton): 170 Cases (24.43%)


Death Cases in Healthcare Zones

  • Zone 1 (Cleveland Area): 79 Cases (3.3%)
  • Zone 2 (Columbus Area): 16 Cases (1.67%)
  • Zone 3 (Cincinnati/Dayton): 24 Cases (3.45%)

Column

Summary Statistics

Today: April 05, 2020

The latest onset date is April 04, 2020.

  • Total Number of Confirmed Cases: 4043
  • Total Number of Hospitalizations: 1104
  • Total Number of Deaths: 119

Last Updated: Sun Apr 05 18:04:39 2020

Age Distribution

Sex Distribution

Daily Cases

Column

Apr

Mar

Feb

Column

Distribution of Daily Cases

Distribution by Counties

Distribution by Age

Column

Distribution of Confirmed Cases by the Age Range

We excluded 3 people whose age is unknown.


Column

Distribution of Death Cases by Age

We excluded 3 people whose age is unknown.


---
title: "Ohio COVID-19"
author: "Ying-Ju Tessa Chen"
output: 
  flexdashboard::flex_dashboard:
    theme: journal
    orientation: columns
    social: ["facebook", "twitter", "linkedin"]
    source_code: embed
---




```{r setup, include=FALSE}
library(flexdashboard)  ## you need this package to create dashboard
```

Basic Information
=======================================================================
Column  {data-width=400}
---
  
### Introduction
In this dashboard, we summarize the information from the data provided by Ohio Department of Health. 

In this data set, there are 8 variables. 

- **County**: 88 counties
- **Sex**: Female, Male, Unknown
- **Age Range**: 0-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+, Unknown 
- **Onset Data**
- **Date of Death**
- **Case Count**
- **Death Count**
- **Hospitalized Count**

  
```{r}
# load necessary packages
library(data.table)
library(ggplot2)
library(ggmap)
library(plotly)
library(plyr)
library(chron)
library(Hmisc)
library(stringr)
library(sp)
```
  
```{r}
df <- fread("https://coronavirus.ohio.gov/static/COVIDSummaryData.csv")
colnames(df) <- c("County", "Sex", "Age_Range", "Onset_Date",         
                  "Date_Of_Death", "Case_Count",        
                  "Death_Count", "Hospitalized_Count")

# remove the last row that shows the total count and make sure the type of each variable is correct                 
df <- as.data.frame(df[1:(nrow(df)-1),])
df[,1:3] <- lapply(df[,1:3], factor)
df[,4:5] <- lapply(df[,4:5], function(x)  as.Date(x, "%m/%d/%Y"))
df[,6:8] <- lapply(df[,6:8], as.numeric)

R1 <- c("Williams", "Defiance", "Paulding", "Van Wert", 
        "Mercer", "Fulton", "Henry", "Putnam", "Allen",
        "Auglaize", "Lucas", "Wood", "Hancock", "Ottawa",
        "Sandusky", "Seneca", "Erie", "Huron")
R2 <- c("Lorain", "Cuyahoga", "Geauga", "Lake", "Ashtabula")
R3 <- c("Darke", "Preble", "Shelby", "Miami", "Montgomery", 
        "Champaign", "Clark", "Greene")
R4 <- c("Crawford", "Delaware", "Fairfield", "Fayette", 
        "Franklin", "Hardin", "Knox", "Licking", "Logan",
        "Madison", "Marion", "Morrow", "Pickaway", "Union", "Wyandot")
R5 <- c("Richland", "Ashland", "Medina", "Wayne", "Holmes",
        "Summit", "Stark", "Tuscarawas", "Trumbull", "Portage",
        "Mahoning", "Columbiana", "Carroll")
R6 <- c("Butler", "Hamilton", "Warren", "Clermont", 
        "Clinton", "Highland", "Brown", "Adams")
R7 <- c("Ross", "Pike", "Scioto", "Hocking", "Vinton",
        "Jackson", "Lawrence", "Athens", "Meigs", "Gallia")
R8 <- c("Coshocton", "Muskingum", "Perry", "Morgan",
        "Guernsey", "Noble", "Washington", "Harrison",
        "Belmont", "Monroe", "Jefferson")

Zone1 <- c(R1, R2, R5)
Zone2 <- c(R4, R7, R8)
Zone3 <- c(R3, R6)
df$Zones <- c(NA)
df$Zones <- ifelse(df$County%in%Zone1, 1, df$Zones)
df$Zones <- ifelse(df$County%in%Zone2, 2, df$Zones)
df$Zones <- ifelse(df$County%in%Zone3, 3, df$Zones)
Hospitalized_Zones <- table(df$Zones, df$Hospitalized_Count)
Hospitalized_Zones_Cases <- apply(Hospitalized_Zones, 1, function(x) sum(x*as.numeric(colnames(Hospitalized_Zones))))
Cases_Zones <- table(df$Zones, df$Case_Count)
Cases_Zones_all <- apply(Cases_Zones, 1, function(x) sum(x*as.numeric(colnames(Cases_Zones))))
Death_Zones <- table(df$Zones, df$Death_Count)
Death_Zones_Cases <- apply(Death_Zones, 1, function(x) sum(x*as.numeric(colnames(Death_Zones))))
```
\

**Hospitalized Cases in Healthcare Zones**

- **Zone 1 (Cleveland Area):** `r unname(Hospitalized_Zones_Cases[1])` Cases (`r round(Hospitalized_Zones_Cases[1]/Cases_Zones_all[1]*100, 2)`%)
- **Zone 2 (Columbus Area):** `r unname(Hospitalized_Zones_Cases[2])` Cases (`r round(Hospitalized_Zones_Cases[2]/Cases_Zones_all[2]*100, 2)`%)
- **Zone 3 (Cincinnati/Dayton):** `r unname(Hospitalized_Zones_Cases[3])` Cases (`r round(Hospitalized_Zones_Cases[3]/Cases_Zones_all[3]*100, 2)`%)

\

**Death Cases in Healthcare Zones**

- **Zone 1 (Cleveland Area):** `r unname(Death_Zones_Cases[1])` Cases (`r round(Death_Zones_Cases[1]/Cases_Zones_all[1]*100, 2)`%)
- **Zone 2 (Columbus Area):** `r unname(Death_Zones_Cases[2])` Cases (`r round(Death_Zones_Cases[2]/Cases_Zones_all[2]*100, 2)`%)
- **Zone 3 (Cincinnati/Dayton):** `r unname(Death_Zones_Cases[3])` Cases (`r round(Death_Zones_Cases[3]/Cases_Zones_all[3]*100, 2)`%)


Column {data-width=600}
---

```{r}
all_dates <- names(table(df$Onset_Date))
latest_date <- sort(df$Onset_Date, decreasing = TRUE)[1]
```

### Summary Statistics
**Today: `r format(Sys.Date(), "%B %d, %Y")`**

**The latest onset date is `r format(latest_date, "%B %d, %Y")`.**

- Total Number of **Confirmed Cases**: `r sum(df$Case_Count)`
- Total Number of **Hospitalizations**: `r sum(df$Hospitalized_Count)`
- Total Number of **Deaths**: `r sum(df$Death_Count)`


**Last Updated:** `r date()`

### Age Distribution

```{r}
AGE_summary <- table(df$Age_Range)
AGE_count <- as.vector(unname(AGE_summary))
AGE <- data.frame(age=AGE_count, percent=paste0(round(AGE_count/sum(AGE_count)*100, 2), "%"))
rownames(AGE) <- names(AGE_summary)
colnames(AGE) <- c("Count", "Percent")
DT::datatable(t(AGE), options = list(
 columnDefs = list(list(className = 'dt-center', targets = 0:nrow(AGE)))
))
```


### Sex Distribution

```{r}
Sex_summary <- table(df$Sex)
Sex_count <- as.vector(unname(Sex_summary))
SEX <- data.frame(sex=Sex_count, percent=paste0(round(Sex_count/sum(Sex_count)*100, 2), "%"))
rownames(SEX) <- names(Sex_summary)
colnames(SEX) <- c("Count", "Percent")
DT::datatable(t(SEX), options = list(
 columnDefs = list(list(className = 'dt-center', targets = 0:nrow(SEX)))
))
```

Daily Cases
=======================================================================

Column {.tabset data-width=500}
-----------------------------------------------------------------------

```{r}
date_sum <- table(df$Onset_Date, df$Case_Count)
daily_cases <- apply(date_sum, 1, function(x) sum(x*as.numeric(colnames(date_sum))))

monthly <- data.frame(dates=as.Date(all_dates, "%Y-%m-%d"), cases=daily_cases)
rownames(monthly) <- c()

cal <- function(month, year) {
    if(missing(year) && missing(month)) {
      tmp <- month.day.year(Sys.Date())
      year <- tmp$year
      month <- tmp$month
    }
    
    if(missing(year) || missing(month)){  # year calendar
      if(missing(year)) year <- month
      par(mfrow=c(4,3))
      tmp <- seq.dates( from=julian(1,1,year), to=julian(12,31,year) )
      tmp2 <- month.day.year(tmp)
      wd <- do.call(day.of.week, tmp2)
      par(mar=c(1.5,1.5,2.5,1.5))
      for(i in 1:12){
        w <- tmp2$month == i
        cs <- cumsum(wd[w]==0)
        if(cs[1] > 0) cs <- cs - 1
        nr <- max( cs ) + 1
        plot.new()
        plot.window( xlim=c(0,6), ylim=c(0,nr+1) )
        text( wd[w], nr - cs -0.5 , tmp2$day[w] )
        title( main=month.name[i] )
        text( 0:6, nr+0.5, c('S','M','T','W','T','F','S') )
      }
      
    } else {  # month calendar
      
      ld <- seq.dates( from=julian(month,1,year), length=2, by='months')[2]-1
      days <- seq.dates( from=julian(month,1,year), to=ld)
      tmp <- month.day.year(days)
      wd <- do.call(day.of.week, tmp)
      cs <- cumsum(wd == 0)
      if(cs[1] > 0) cs <- cs - 1
      nr <- max(cs) + 1
      par(oma=c(0.1,0.1,4.6,0.1))
      par(mfrow=c(nr,7))
      par(mar=c(0,0,0,0))
      for(i in seq_len(wd[1])){ 
        plot.new()
        #box()
      }
      day.name <- c('Sun','Mon','Tues','Wed','Thur','Fri','Sat')
      for(i in tmp$day){
        plot.new()
        box()
        text(0,1, i, adj=c(0,1))
        if(i < 8) mtext( day.name[wd[i]+1], line=0.5,
                         at=grconvertX(0.5,to='ndc'), outer=TRUE ) 
      }
      mtext(month.name[month], line=2.5, at=0.5, cex=1.75, outer=TRUE)
      #box('inner') #optional 
    }
}
week_days <- function(x){
  days <- c(1:7)
  names(days) <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
  days_index <- which(names(days)==x)
  return(unname(days[days_index]))
}
  
```

```{r , message=FALSE, echo=FALSE, cache=TRUE, error=FALSE, results='asis'}

for (i in month(latest_date):2){
    df_m <- monthly[which(month(monthly$dates)==i),]
    first_day <- weekdays(as.Date(paste0("2020-", i, "-01"), "%Y-%m-%d"))
    C_matrix <- matrix(NA, ncol=3, nrow=monthDays(as.Date(paste0("2020-", i, "-01"))))
    total_days <- week_days(first_day):(week_days(first_day)+monthDays(as.Date(paste0("2020-", i, "-01")))-1)
    C_matrix[,1] <- ceiling(total_days/7)
    C_matrix[,2] <- total_days%%7
    C_matrix[,2] <- ifelse(C_matrix[,2]==0, 7, C_matrix[,2])
    for (j in 1:nrow(df_m)){
      C_matrix[mday(df_m$dates[j]),3] <- df_m$cases[j]
    }

    cat('### ', month.abb[i],' \n')
    cal(i, 2020)
    for (k in mday(df_m$dates)){
        par(mfg=C_matrix[k,1:2])
        text(.5, .5, as.character(C_matrix[k,3]), cex=2)
    }
    cat('\n \n')
}
```

Column {data-width=500}
-----------------------------------------------------------------------

### Distribution of Daily Cases

```{r}
D <- data.frame(Dates=names(daily_cases), cases=unname(daily_cases))

p_dates <- plot_ly(D, x=~Dates, y=~cases, type="bar", text=as.character(cumsum(daily_cases)), name="", 
hovertemplate = paste('%{x}', '
Daily Cases: %{y:s}
', 'Total Cases: %{text:s}')) p_dates <- p_dates %>% layout(uniformtext=list(minsize=8,mode='hide')) %>% config(displayModeBar = F) p_dates ``` Distribution by Counties ======================================================================= ```{r} county_cases <- table(df$County, df$Case_Count) county_cases_all <- apply(county_cases, 1, function(x) sum(x*as.numeric(colnames(county_cases)))) df_ohio_cases <- data.frame(county=names(county_cases_all), count=county_cases_all) rownames(df_ohio_cases) <- c() usa <- map_data("county") # get basic map data for all USA counties oh <- subset(usa, region == "ohio") # subset to counties in Ohio oh$county = str_to_title(oh$subregion) my.df = merge(oh, df_ohio_cases, by = "county", all.x = TRUE, sort = FALSE) #my.df$count <- ifelse(is.na(my.df$count), 0, my.df$count) my.df = my.df[order(my.df$order), ] getLabelPoint <- # Returns a county-named list of label points function(county) {Polygon(county[c('long', 'lat')])@labpt} centroids = by(oh, oh$county, getLabelPoint) # Returns list centroids2 <- do.call("rbind.data.frame", centroids) # Convert to Data Frame centroids2$county = rownames(centroids) names(centroids2) <- c('clong', 'clat', "county") # Appropriate Header centroids3 <- merge(centroids2, df_ohio_cases, by="county", all.x=TRUE, sort=FALSE) centroids3$count <- ifelse(is.na(centroids3$count), 0, centroids3$count) centroids3$label <- paste0(centroids3$county,": ", centroids3$count, " Cases") g <- ggplot(centroids3, aes(x = clong, y = clat, group = 1, text = paste0(county, ":
", count, " cases"), )) g <- g + geom_polygon(data=my.df, aes(x=long, y=lat, group=group, fill = count), color="black", size = 0.2) + geom_text(data = centroids3, aes(x = clong, y = clat, label = county), color = "black", size = 4)+ scale_fill_continuous(name="Confirmed Cases", low = "lightblue", high = "darkblue",limits = c(0,max(my.df$count)), na.value = "grey50") + labs(title="Confirmed Cases in Ohio") + theme(legend.position = "none", axis.title.x=element_blank(), axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.y=element_blank(), axis.text.y =element_blank(), axis.ticks.y=element_blank()) ggplotly(g, tooltip = "text") %>% layout(autosize = F, width = 1200, height = 800) ``` Distribution by Age ======================================================================= Column {data-width=500} --- ### Distribution of Confirmed Cases by the Age Range **We excluded `r length(which(df$Age_Range=="Unknown"))` people whose age is unknown.** \ ```{r} # remove the cases for which the age range is "Unknown" if (length(which(df$Age_Range=="Unknown"))==0){ df1 <- df }else{ df1 <- df[-which(df$Age_Range=="Unknown"),] } df1$Age_Range <- factor(df1$Age_Range) # find counts and relative counts (%) in each age range Age_Dist <- table(df1$Age_Range, df1$Case_Count) n <- sum(apply(Age_Dist, 1, function(x) sum(x*as.numeric(colnames(Age_Dist))))) Age_Percent <- round(apply(Age_Dist, 1, function(x) sum(x*as.numeric(colnames(Age_Dist))))/n*100,2) # form a data frame for the summary information of AGE df_age <- data.frame(Age_Range=levels(df1$Age_Range), Percent_Cases=Age_Percent, text1=paste0(Age_Percent, "%")) # obtatin the bar chart for the distribution of Ohio's confirmed cases by the Age Range p_age <- plot_ly(df_age, x=~Age_Range, y=~Percent_Cases, type="bar", text = df_age$text1, textposition = 'outside')%>% config(displayModeBar = F) p_age <- p_age %>% layout(title="Ages of Ohio's Confirmed Cases", xaxis=list(title="Age Range"), yaxis=list(title="Percent of Cases")) p_age %>% layout(autosize = F, width = 650, height = 650) ``` Column {data-width=500} --- ### Distribution of Death Cases by Age **We excluded `r length(which(df$Age_Range=="Unknown"))` people whose age is unknown.** \ ```{r} # find death counts and relative counts (%) in each age range Age_Dist_Death <- table(df1$Age_Range, df1$Death_Count) n <- sum(apply(Age_Dist_Death, 1, function(x) sum(x*as.numeric(colnames(Age_Dist_Death))))) Age_Percent_Death <- round(apply(Age_Dist_Death, 1, function(x) sum(x*as.numeric(colnames(Age_Dist_Death))))/n*100,2) # form a data frame for the summary information of AGE df_age_death <- data.frame(Age_Range=levels(df1$Age_Range), Percent_Cases=Age_Percent_Death, text1=paste0(Age_Percent_Death, "%")) # obtatin the bar chart for the distribution of Ohio's confirmed cases by the Age Range p_age_death <- plot_ly(df_age_death, x=~Age_Range, y=~Percent_Cases, type="bar", text = df_age_death$text1, textposition = 'outside')%>% config(displayModeBar = F) p_age_death <- p_age_death %>% layout(title="Ages of Ohio's Death Cases", xaxis=list(title="Age Range"), yaxis=list(title="Percent of Death Cases")) p_age_death %>% layout(autosize = F, width = 650, height = 650) ```